- Path: hellwig.a2i!hellwig
- From: Oliver Hellwig <hellwig@rahul.net>
- Newsgroups: comp.lang.c++,comp.lang.c,comp.os.ms-windows.programmer.misc
- Subject: Re: fastest code
- Date: 11 Apr 1996 17:02:20 GMT
- Organization: a2i network
- Message-ID: <4kjdus$fiq@samba.rahul.net>
- References: <316112A2.7D37@public.sta.net.cn> <4kghs7$250@news1.mnsinc.com> <4kgu7g$n9a@solutions.solon.com> <4kh2p7INNkcq@keats.ugrad.cs.ubc.ca>
- NNTP-Posting-Host: waltz.rahul.net
- NNTP-Posting-User: hellwig
-
- In article <4kh2p7INNkcq@keats.ugrad.cs.ubc.ca>,
- Kazimir Kylheku <c2a192@ugrad.cs.ubc.ca> wrote:
- >In article <4kgu7g$n9a@solutions.solon.com>,
- >Peter Seebach <seebs@solon.com> wrote:
- > >In article <4kghs7$250@news1.mnsinc.com>,
- > >Szu-Wen Huang <huang@mnsinc.com> wrote:
- > >>Peter Seebach (seebs@solutions.solon.com) wrote:
- > >>: In article <1996Apr10.110121.6784@friend.kastle.com>,
- > >>: Richard Krehbiel <rich@kastle.com> wrote:
- > >>: >Oliver Hellwig <hellwig@rahul.net> wrote:
- > >>: >> for (i=0; i<16; i++)
- > >>: >> prom[i] = prom[i+i];
- > >
- > >>: HUH? the code as written has a clear effect, which is to shove all of
- > >>: the elements of an array over one. It certainly is an optimizer bug.
- > >
- > >>: Read the code carefully; the 2nd reference to prom[] uses a different
- > >>: index into the array. This is not a meaningless statement.
- > >
- > >>*You* read closely. The second index is "i+i". ;) Okay, so it's a
- > >>typo, but one who says "read carefully" is expected to. Cheers.
- > >
- > >Okay, so the code becomes undefined when i becomes 8.
- >
- >No it actually does not. The prom array is declared to be 32 elements wide (I
- >checked the code yesterday, the precise name is something like SA_prom), so
- >prom[16] is well defined, as is prom[30]. The purpose of the code is to
- >pack together array elements with even indices in an array of 32 elements;
- >the loop is correct.
- >
- >What happens is that these 32 bytes are read from a hardware register and
- >buffered. The code that reads them detects if they come in equivalent pairs and
- >sets a flag called ``wordlength'' accordingly. If they are in pairs, they have
- >to be squished, and the offending network card has to be sent a message to stop
- >doing that (according to my primitive understanding of the code).
- >
- >Secondly, volatile is not going to help. prom[] is not an lvalue representing a
- >memory mapped I/O region, or any storage that can unexpectedly change between
- >sequence points, but a perfectly ordinary auto variable.
- >
- > >And does nothing when i is 0.
- > >
- > >But the intervening few cases would be expected to produce
- > >assignments.
- > >
- > >I still think eliminating the assignments is a bug, and that "volatile"
- > >should not have any effect, but I'll grant that it's far
- > >from the only problem.
- >
- >Precisely. Volatile should have no effect here. The values of the array are
- >subsequently depended on by other code, so eliminating the values is incorrect.
- >
- >The compiler would have to be extremely smart to eliminate the ``compression''
- >loop and then split the subsequent code into two cases based on whether the
- >loop was or was not supposed to have happened (this depends on a single
- >variable ``wordlength'' that is set to either 1 or 2). But the original poster
- >said that the compiler kept the loop anyway, just eliminated the assignments,
- >and it's doubtful that Watcom, or any other compiler, would do this sort of
- >optimization.
-
- Well, it did! I did not turn on full optimizations; I just used the
- compiler in its default mode. When I turned off all optimizations,
- it compiled correctly. I verified that it kept the loop but removed
- the assignment by using the Watcom-supplied WDISASM.EXE, which converts
- object files into assembly files (the Watcom compiler does not have
- a switch to output assembly directly). I assumed that some people
- would be able to verify this with their own Watcom compilers.
- However, I'm going to include my test program and its disassembly,
- which shows that the loop is still there but its body is gone.
-
-
- --------------------------file: bug.c--------------------------
- /*
- This code demonstrates an optimizer bug in Watcom 10.5.
-
- Compile with: wcl386 bug.c
-
- The bug goes away when optimizations are disabled:
- wcl386 -od bug.c
- */
- #include <stdio.h>
-
- int
- main(void)
- {
- unsigned char prom[32];
- int i;
-
- for (i=0; i<32; i++)
- prom[i] = i;
-
- printf("Initial array:\n\t");
- for (i=0; i<16; i++)
- printf("%d ", prom[i]);
-
- printf("\n");
-
- /* A disassembly shows that the assignments in this loop */
- /* have been removed, so the array is never modified. */
- /* This code comes from the Linux ne2000 network driver; */
- /* its purpose is to compress the 32 doubled bytes down */
- /* to 16 bytes. */
- for (i=0; i<16; i++)
- prom[i] = prom[i+i];
-
- printf("This array should only show even numbers:\n\t");
- /* with the miscompiled loop, the array is still unchanged here */
- for (i=0; i<16; i++)
- printf("%d ", prom[i]);
-
- printf("\n");
-
- return 0;
- }
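The comment in bug.c and the quoted explanation describe what the ne2000 driver does with the 32 buffered PROM bytes: detect whether they arrive as doubled pairs, record that as ``wordlength'', and squash the doubled form down to 16 bytes. Below is a minimal sketch of that logic, assuming a 32-byte buffer; the names PROM_SIZE, detect_wordlength and squash_prom are hypothetical and are not taken from the driver source. The comment in squash_prom spells out why the in-place loop from bug.c is safe.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed buffer size: 32 bytes read from the card, as described above. */
#define PROM_SIZE 32

/* Hypothetical helper: returns 2 if the bytes arrive as identical pairs
   (prom[0] == prom[1], prom[2] == prom[3], ...), otherwise 1. */
int detect_wordlength(const unsigned char prom[PROM_SIZE])
{
    int i;

    assert(prom != NULL);
    for (i = 0; i < PROM_SIZE; i += 2)
        if (prom[i] != prom[i + 1])
            return 1;
    return 2;
}

/* Hypothetical helper: packs the even-indexed bytes into
   prom[0..PROM_SIZE/2 - 1]; the upper half is left as it was.
   The in-place copy is safe because the read index 2*i is never
   below the write index i, so every byte is read before any
   iteration can overwrite it. */
void squash_prom(unsigned char prom[PROM_SIZE])
{
    int i;

    assert(prom != NULL);
    for (i = 0; i < PROM_SIZE / 2; i++)
        prom[i] = prom[i + i];
}
```

For a doubled buffer, detect_wordlength returns 2 and squash_prom leaves the 16 distinct bytes at the front of the array; a correct compiler must perform those stores, since later code reads them.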
-
- ------------------------ file: bug.asm ----------------
-
- .386p
- NAME BUG
- EXTRN _cstart_ :BYTE
- EXTRN printf_ :BYTE
- EXTRN __CHK :BYTE
- DGROUP GROUP CONST,CONST2,_DATA,_BSS
- _TEXT SEGMENT BYTE PUBLIC USE32 'CODE'
- ASSUME CS:_TEXT ,DS:DGROUP,SS:DGROUP
- PUBLIC main_
- main_: push 00000030H
- call near ptr __CHK
- push edx
- sub esp,00000020H
- xor edx,edx
- L1: mov byte ptr [esp+edx],dl
- inc edx
- cmp edx,00000020H
- jl short L1
- push offset DGROUP:L5
- call near ptr printf_
- add esp,00000004H
- xor edx,edx
- L2: xor eax,eax
- mov al,byte ptr [esp+edx]
- push eax
- push offset DGROUP:L6
- call near ptr printf_
- add esp,00000008H
- inc edx
- cmp edx,00000010H
- jl short L2
- push offset DGROUP:L7
- call near ptr printf_
- add esp,00000004H
- xor edx,edx
- L3: inc edx
- cmp edx,00000010H
- jl short L3
- push offset DGROUP:L8
- call near ptr printf_
- add esp,00000004H
- xor edx,edx
- L4: xor eax,eax
- mov al,byte ptr [esp+edx]
- push eax
- push offset DGROUP:L6
- call near ptr printf_
- add esp,00000008H
- inc edx
- cmp edx,00000010H
- jl short L4
- push offset DGROUP:L7
- call near ptr printf_
- add esp,00000004H
- xor eax,eax
- add esp,00000020H
- pop edx
- ret
- _TEXT ENDS
-
- CONST SEGMENT DWORD PUBLIC USE32 'DATA'
- L5 DB 49H,6eH,69H,74H,69H,61H,6cH,20H
- DB 61H,72H,72H,61H,79H,3aH,0aH,09H
- DB 00H
- L6 DB 25H,64H,20H,00H
- L7 DB 0aH,00H
- L8 DB 54H,68H,69H,73H,20H,61H,72H,72H
- DB 61H,79H,20H,73H,68H,6fH,75H,6cH
- DB 64H,20H,6fH,6eH,6cH,79H,20H,73H
- DB 68H,6fH,77H,20H,65H,76H,65H,6eH
- DB 20H,6eH,75H,6dH,62H,65H,72H,73H
- DB 3aH,0aH,09H,00H
- CONST ENDS
-
- CONST2 SEGMENT DWORD PUBLIC USE32 'DATA'
- CONST2 ENDS
-
- _DATA SEGMENT DWORD PUBLIC USE32 'DATA'
- _DATA ENDS
-
- _BSS SEGMENT DWORD PUBLIC USE32 'BSS'
- _BSS ENDS
-
- END
-
- ---------------------end of bug.asm ------------------------
-
- When you look at the loop at L3, you will notice that it is empty:
- EDX is incremented and compared 16 times, but no bytes are moved.
- I just compiled this with "wcl386 bug.c". It's possible that if I
- were to turn on full optimizations it might also remove the loop.
- I didn't bother to try this because it didn't seem interesting to
- me. I just want it to compile correct programs!
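Checking the disassembly with WDISASM works, but a test program can also check its own result. Below is a minimal sketch, assuming a C89-style compiler like the one above: compaction_works (a hypothetical name) repeats the fill-and-compact sequence from bug.c and compares the first 16 bytes against 0, 2, ..., 30. Whether Watcom 10.5 miscompiles this exact function the same way has not been verified here; it should be built with the same switches as bug.c.

```c
#include <stdio.h>

/* Hypothetical self-check: runs the same loops as bug.c and returns 1
   if prom[0..15] holds 0, 2, 4, ..., 30 afterwards, otherwise prints
   the first mismatch and returns 0. */
int compaction_works(void)
{
    unsigned char prom[32];
    int i;

    for (i = 0; i < 32; i++)
        prom[i] = (unsigned char)i;

    /* the loop from the ne2000 driver that was miscompiled */
    for (i = 0; i < 16; i++)
        prom[i] = prom[i + i];

    for (i = 0; i < 16; i++) {
        if (prom[i] != 2 * i) {
            printf("mismatch at prom[%d]: got %d, expected %d\n",
                   i, prom[i], 2 * i);
            return 0;
        }
    }
    return 1;
}
```

Calling it from main as `return compaction_works() ? 0 : 1;` turns the check into an exit status, so a build script can catch the problem without reading assembly.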
-
- --
- Oliver Hellwig
- hellwig@rahul.net
-